579 research outputs found

Mining Mid-level Features for Action Recognition Based on Effective Skeleton Representation

Authors: Gao Zhimin, Li Wanqing, Ogunbona Philip, Wang Pichao, Zhang Hanling
Publication date: 01/01/2014

Recently, mid-level features have shown promising performance in computer vision. Mid-level features learned by incorporating class-level information are potentially more discriminative than traditional low-level local features. In this paper, an effective method is proposed to extract mid-level features from Kinect skeletons for 3D human action recognition. First, the orientation of each limb, defined by the two skeleton joints it connects, is computed and encoded into one of 27 states indicating the spatial relationship of the joints. Second, limbs are combined into parts and the limb states are mapped into part states. Finally, frequent pattern mining is employed to mine the most frequent and relevant (discriminative, representative and non-redundant) states of parts over several consecutive frames. These parts are referred to as Frequent Local Parts, or FLPs. The FLPs allow us to build a powerful bag-of-FLP-based action representation. This new representation yields state-of-the-art results on MSR DailyActivity3D and MSR ActionPairs3D.
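
The abstract does not say how the 27 orientation states are defined. One plausible reading, assumed here purely for illustration, is 3 × 3 × 3: along each of the x, y and z axes the second joint is below, roughly level with, or above the first. The function name `encode_limb_state` and the `threshold` parameter are hypothetical and do not come from the paper.

```python
from typing import Sequence


def encode_limb_state(
    joint_a: Sequence[float],
    joint_b: Sequence[float],
    threshold: float = 0.05,
) -> int:
    """Map the direction joint_a -> joint_b to an integer state in 0..26.

    Assumption (not from the paper): each axis contributes one ternary
    digit, 0 = negative, 1 = roughly equal, 2 = positive.
    """
    state = 0
    for a, b in zip(joint_a, joint_b):
        diff = b - a
        if diff > threshold:
            digit = 2
        elif diff < -threshold:
            digit = 0
        else:
            digit = 1
        # Accumulate the base-3 number, x axis as the most significant digit.
        state = state * 3 + digit
    return state
```

Under this reading, a part state could be formed by concatenating the states of its limbs, although the actual mapping used by the authors may differ.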

arXiv.org e-Print Archive

Crossref

Research Online

Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks

Authors: Gao Zhimin, Li Wanqing, Liu Song, Ogunbona Philip, Wang Pichao, Zhang Yuyao
Publication date: 01/01/2016

This paper addresses the problem of continuous gesture recognition from sequences of depth maps using convolutional neural networks (ConvNets). The proposed method first segments individual gestures from a depth sequence based on quantity of movement (QOM). For each segmented gesture, an Improved Depth Motion Map (IDMM), which converts the depth sequence into one image, is constructed and fed to a ConvNet for recognition. The IDMM effectively encodes both spatial and temporal information and allows fine-tuning of existing ConvNet models for classification without introducing millions of parameters to learn. The proposed method is evaluated on the Large-scale Continuous Gesture Recognition task of the ChaLearn Looking at People (LAP) challenge 2016. It achieved a Mean Jaccard Index of 0.2655 and ranked 3rd in this challenge.
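
For readers unfamiliar with the metric, the Jaccard index of a predicted gesture segment against the ground truth is the number of frames they share divided by the number of frames covered by either. The sketch below is a minimal illustration of that definition; representing segments as sets of frame indices and averaging over a flat list of pairs are assumptions made here, and the challenge's exact averaging scheme is not described in the abstract.

```python
from typing import Iterable, Set, Tuple


def jaccard_index(predicted: Set[int], truth: Set[int]) -> float:
    """Intersection over union of two sets of frame indices."""
    union = predicted | truth
    if not union:
        # Both segments empty: treated as zero overlap in this sketch.
        return 0.0
    return len(predicted & truth) / len(union)


def mean_jaccard(pairs: Iterable[Tuple[Set[int], Set[int]]]) -> float:
    """Average the Jaccard index over (predicted, truth) pairs."""
    scores = [jaccard_index(p, t) for p, t in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

A prediction covering frames 0 to 9 against a ground truth of frames 5 to 14 shares 5 frames out of 15 covered, giving an index of 1/3.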

arXiv.org e-Print Archive

Crossref

Research Online

Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks

Authors: Gao Zhimin, Li Wanqing, Liu Song, Ogunbona Philip, Tang Chang, Wang Pichao
Publication date: 01/01/2016

This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI). These dynamic images are constructed from a sequence of depth maps using bidirectional rank pooling to effectively capture the spatial-temporal information. Such image-based representations enable us to fine-tune existing ConvNet models trained on image data for classification of depth sequences, without introducing a large number of parameters to learn. Upon the proposed representations, a convolutional neural network (ConvNet) based method is developed for gesture recognition and evaluated on the Large-scale Isolated Gesture Recognition task of the ChaLearn Looking at People (LAP) challenge 2016. The method achieved 55.57% classification accuracy and ranked 2nd in this challenge, very close to the best performance even though we only used depth data.

Comment: arXiv admin note: text overlap with arXiv:1608.0633

arXiv.org e-Print Archive

Crossref

Research Online

Dual Long Short-Term Memory Networks for Sub-Character Representation Learning

Authors: Feng Yi, Gao Zhimin, He Han, Townsend George, Wu Lei, Yan Hua, Yang Xiaokun
Publication date: 01/01/2018

Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). But many non-Latin languages have hieroglyphic writing systems, involving a big alphabet with thousands or millions of characters. Each character is composed of even smaller parts, which are often ignored by previous work. In this paper, we propose a novel architecture employing two stacked Long Short-Term Memory Networks (LSTMs) to learn sub-character level representations and capture deeper levels of semantic meaning. To build a concrete study and substantiate the efficiency of our neural architecture, we take Chinese Word Segmentation as a research case. Among those languages, Chinese is a typical case, in which every character contains several components called radicals. Our networks employ a shared radical-level embedding to solve both Simplified and Traditional Chinese Word Segmentation without extra Traditional-to-Simplified Chinese conversion; in such a highly end-to-end way, word segmentation can be significantly simplified compared to previous work. Radical-level embeddings can also capture deeper semantic meaning below the character level and improve learning performance. By tying radical and character embeddings together, the parameter count is reduced while semantic knowledge is shared and transferred between the two levels, boosting performance considerably. On 3 out of 4 Bakeoff 2005 datasets, our method surpassed state-of-the-art results by up to 0.4%. Our results are reproducible; source code and corpora are available on GitHub.

Comment: Accepted & forthcoming at ITNG-201

arXiv.org e-Print Archive

Crossref

On-chip spectroscopy with thermally-tuned high-Q photonic crystal cavities

Authors: Andreas C. Liapis, Boshen Gao, Herzberg G., Mahmudur R. Siddiqui, Robert W. Boyd, Zhimin Shi
Publication venue: AIP Publishing
Publication date: 03/11/2015

Spectroscopic methods are a sensitive way to determine the chemical composition of potentially hazardous materials. Here, we demonstrate that thermally-tuned high-Q photonic crystal cavities can be used as a compact high-resolution on-chip spectrometer. We have used such a chip-scale spectrometer to measure the absorption spectra of both acetylene and hydrogen cyanide in the 1550 nm spectral band, and show that we can discriminate between the two chemical species even though both have spectral features in the same spectral region. Our results pave the way for the development of chip-size chemical sensors that can detect toxic substances.

arXiv.org e-Print Archive

USFSP Digital Archive

Crossref

Scholar Commons - University of South Florida

Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks

Authors: Gao Zhimin, Li Wanqing, Ogunbona Philip, Tang Chang, Wang Pichao
Publication date: 01/01/2018

This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition. These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to effectively capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time, while DDNI and DDMNI exploit the 3D structural information captured by depth maps. Upon the proposed representations, a ConvNet-based method is developed for action recognition. The image-based representations enable us to fine-tune existing Convolutional Neural Network (ConvNet) models trained on image data without training a large number of parameters from scratch. The proposed method achieved state-of-the-art results on three large datasets, namely the Large-scale Continuous Gesture Recognition Dataset (mean Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset (59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22% cross-view), even though only the depth modality was used.

Comment: arXiv admin note: text overlap with arXiv:1701.01814, arXiv:1608.0633

arXiv.org e-Print Archive

Research Online

Effects of doping in 25-atom bimetallic nanocluster catalysts for carbon–carbon coupling reaction of iodoanisole and phenylacetylene

Authors: Li Gao, Li Zhimin, Liu Chao, Wang Jin, Yang Xiujuan
Publication venue: Chinese Materials Research Society. Published by Elsevier B.V.
Publication date: 01/10/2016

We report the catalytic effects of foreign atoms (Cu, Ag, and Pt) doped into well-defined 25-gold-atom nanoclusters. Using the carbon-carbon coupling reaction of p-iodoanisole and phenylacetylene as a model reaction, the gold-based bimetallic MxAu25−x(SR)18 (–SR=–SCH2CH2Ph) nanoclusters (supported on titania) were found to exhibit distinct effects on the conversion of p-iodoanisole as well as the selectivity for the Sonogashira cross-coupling product, 1-methoxy-4-(2-phenylethynyl)benzene. Compared to Au25(SR)18, the centrally doped Pt1Au24(SR)18 causes a drop in catalytic activity with the selectivity retained, while the AgxAu25−x(SR)18 nanoclusters gave an overall performance comparable to Au25(SR)18. Interestingly, CuxAu25−x(SR)18 nanoclusters prefer the Ullmann homo-coupling pathway and give rise to the product 4,4′-dimethoxy-1,1′-biphenyl, in contrast to the other three nanocluster catalysts. Our overall conclusion is that the conversion of p-iodoanisole is largely affected by the electronic effect in the bimetallic nanoclusters’ 13-atom core (i.e., Pt1Au12, CuxAu13−x, and Au13, with the exception of Ag doping), and that the selectivity is primarily determined by the type of atoms on the MxAu12−x shell (M=Ag, Cu, and Au) in the nanocluster catalysts.

Elsevier - Publisher Connector

Directory of Open Access Journals

Institutional Repository of Dalian Institute of Chemical Physics, CAS